
pmclust (version 0.1-4)

EM-like algorithms: EM-like Steps for SPMD

Description

The EM-like algorithms for model-based clustering of finite Gaussian mixture models with unstructured dispersions (unrestricted covariance matrices).

Functions whose names end in .dmat (e.g., kmeans.step.dmat) are the versions for ddmatrix (distributed matrix) data.
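As a rough illustration of what one EM iteration does for a Gaussian mixture with unstructured covariance matrices, here is a minimal serial sketch in base R. It is not pmclust's implementation and does not use SPMD; the functions `dmvnorm_log` and `em_step` and the parameter layout (`eta`, `mu`, `Sigma`) are hypothetical names chosen for this example only.

```r
## Hypothetical serial sketch of one EM iteration (not pmclust's code).

# Log-density of N(mu, Sigma) at each row of X, computed via a Cholesky factor.
dmvnorm_log <- function(X, mu, Sigma) {
  p <- ncol(X)
  R <- chol(Sigma)                                  # Sigma = t(R) %*% R
  Z <- backsolve(R, t(X) - mu, transpose = TRUE)    # solves t(R) %*% z = x - mu
  -0.5 * colSums(Z^2) - sum(log(diag(R))) - 0.5 * p * log(2 * pi)
}

# One EM step for a K-component mixture with full (unstructured) covariances.
em_step <- function(X, eta, mu, Sigma) {
  n <- nrow(X)
  K <- length(eta)
  # E-step: log(eta_k) + log f_k(x_i), an n x K matrix.
  logf <- sapply(seq_len(K), function(k) log(eta[k]) + dmvnorm_log(X, mu[[k]], Sigma[[k]]))
  m <- apply(logf, 1, max)                          # row maxima for log-sum-exp
  logsum <- m + log(rowSums(exp(logf - m)))         # log of the mixture density
  Z <- exp(logf - logsum)                           # posterior probabilities
  # M-step: weighted proportions, means and covariance matrices.
  nk <- colSums(Z)
  eta <- nk / n
  for (k in seq_len(K)) {
    mu[[k]] <- colSums(Z[, k] * X) / nk[k]
    Xc <- sweep(X, 2, mu[[k]])
    Sigma[[k]] <- crossprod(Xc * sqrt(Z[, k])) / nk[k]
  }
  # logL is evaluated at the parameters passed in (before this update).
  list(eta = eta, mu = mu, Sigma = Sigma, logL = sum(logsum))
}

# Usage on simulated two-cluster data (200 + 300 rows, p = 2).
set.seed(123)
X <- rbind(matrix(rnorm(400), ncol = 2), matrix(rnorm(600, mean = 4), ncol = 2))
fit <- list(eta = c(0.5, 0.5), mu = list(c(0, 0), c(1, 1)),
            Sigma = list(diag(2), diag(2)))
ll <- numeric(50)
for (iter in seq_along(ll)) {
  fit <- em_step(X, fit$eta, fit$mu, fit$Sigma)
  ll[iter] <- fit$logL
}
```

Because EM never decreases the observed-data log-likelihood, the values stored in `ll` should be non-decreasing across iterations.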

Usage

em.step(PARAM.org)
aecm.step(PARAM.org)
apecm.step(PARAM.org)
apecma.step(PARAM.org)
kmeans.step(PARAM.org)
kmeans.step.dmat(PARAM.org)

Arguments

PARAM.org
an original set of parameters generated by set.global.

Value

  • The converged results are returned as a new list containing all updated parameters that represent the model components. See the help page of PARAM or PARAM.org for details.

Details

A global variable called X.spmd should exist in the .pmclustEnv environment, usually the working environment. X.spmd is the data matrix to be clustered; it has dimension N.spmd by p.
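In an SPMD run each process holds only its own N.spmd rows of the data, so any quantity that is a sum over rows (counts, column sums, cross-products) can be computed locally and then summed across processes. The sketch below emulates that pattern serially in base R: the row counts per "rank" are made up, and `Reduce()` stands in for an MPI sum-allreduce (for example pbdMPI's `allreduce(x, op = "sum")`). It illustrates the general idea under these assumptions and is not pmclust's internal code.

```r
## Serial emulation of an SPMD row layout (hypothetical sizes, not pmclust's code).
set.seed(1)
p <- 2
N.spmd <- c(5, 7, 4, 9)                        # hypothetical rows held by 4 ranks
X <- matrix(rnorm(sum(N.spmd) * p), ncol = p)  # the full N-by-p data, for checking
rank.id <- rep(seq_along(N.spmd), N.spmd)
X.blocks <- lapply(split(seq_len(nrow(X)), rank.id),
                   function(i) X[i, , drop = FALSE])

# Each rank computes local sufficient statistics from its own block only.
local.stats <- lapply(X.blocks, function(Xb)
  list(n = nrow(Xb), s1 = colSums(Xb), s2 = crossprod(Xb)))

# A sum-allreduce adds these element-wise across ranks; emulated with Reduce().
global <- Reduce(function(a, b) Map(`+`, a, b), local.stats)

# Global mean and maximum-likelihood covariance from the combined statistics.
mu <- global$s1 / global$n
Sigma <- global$s2 / global$n - tcrossprod(mu)
```

The combined `mu` and `Sigma` agree with the values computed directly from the full matrix, which is why each process never needs to see the other processes' rows.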

PARAM.org is a local variable inside each of the EM-like functions em.step, aecm.step, apecm.step, apecma.step, and kmeans.step. It is a list containing all parameters related to the model, and its elements are initially generated by set.global. Each function updates these parameters with the corresponding EM-like algorithm and returns the converged results.
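kmeans.step presumably follows the k-means procedure of Lloyd (1982) listed in the References. A minimal serial sketch of Lloyd's iteration in base R is shown below; it alternates an assignment step and a mean-update step and stops when the class assignments no longer change. The function `lloyd_kmeans`, its arguments, and the stopping rule are hypothetical and may differ from pmclust's own convergence criterion.

```r
## Hypothetical serial sketch of Lloyd's k-means iteration (not pmclust's code).
lloyd_kmeans <- function(X, centers, max.iter = 100) {
  K <- nrow(centers)
  cl.old <- integer(nrow(X))
  for (iter in seq_len(max.iter)) {
    # Assignment step: squared Euclidean distances, an n x K matrix.
    d2 <- outer(rowSums(X^2), rowSums(centers^2), "+") - 2 * tcrossprod(X, centers)
    cl <- max.col(-d2, ties.method = "first")   # index of the nearest center
    if (all(cl == cl.old)) break                # converged: assignments unchanged
    cl.old <- cl
    # Update step: each center becomes the mean of its assigned rows.
    for (k in seq_len(K)) {
      if (any(cl == k)) centers[k, ] <- colMeans(X[cl == k, , drop = FALSE])
    }
  }
  list(centers = centers, class = cl, iter = iter)
}

# Usage on simulated two-cluster data (200 + 300 rows, p = 2).
set.seed(123)
X <- rbind(matrix(rnorm(400), ncol = 2), matrix(rnorm(600, mean = 4), ncol = 2))
fit <- lloyd_kmeans(X, X[c(1, 500), ])          # one starting center per cluster
```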

References

High Performance Statistical Computing (HPSC) Website: http://thirteen-01.stat.iastate.edu/snoweye/hpsc/

Programming with Big Data in R Website: http://r-pbd.org/

Chen, W.-C. and Maitra, R. (2011) Model-based clustering of regression time series data via APECM -- an AECM algorithm sung to an even faster beat, Statistical Analysis and Data Mining, 4, 567-578.

Chen, W.-C., Ostrouchov, G., Pugmire, D., Prabhat, M., and Wehner, M. (2013) Exploring Multivariate Relationships in Large Spatio-Temporal Data with Parallel Model-Based Clustering and Scalable Graphics, Technometrics, (revision).

Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society Series B, 39, 1-38.

Lloyd, S. P. (1982) Least squares quantization in PCM, IEEE Transactions on Information Theory, 28, 129-137.

Meng, X.-L. and Van Dyk, D. (1997) The EM Algorithm -- an Old Folk-song Sung to a Fast New Tune, Journal of the Royal Statistical Society Series B, 59, 511-567.

See Also

set.global, mb.print, set.global.dmat.

Examples

# Save code in a file "demo.r" and run in 4 processors by
# > mpiexec -np 4 Rscript demo.r

### Setup environment.
library(pmclust, quietly = TRUE)
comm.set.seed(123)

### Generate an example data.
N.allspmds <- rep(5000, comm.size())
N.spmd <- 5000
N.K.spmd <- c(2000, 3000)
N <- 5000 * comm.size()
p <- 2
K <- 2
data.spmd <- generate.basic(N.allspmds, N.spmd, N.K.spmd, N, p, K)
X.spmd <- data.spmd$X.spmd

### Run clustering.
PARAM.org <- set.global(K = K)          # Set global storages.
# PARAM.org <- initial.em(PARAM.org)    # One initialization.
PARAM.org <- initial.RndEM(PARAM.org)   # Ten initializations by default.
PARAM.new <- apecma.step(PARAM.org)     # Run APECMa.
em.update.class()                       # Get classification.

### Get results.
N.CLASS <- get.N.CLASS(K)
comm.cat("# of class:", N.CLASS, "\n")

### Quit.
finalize()
